Micro-blog new word discovery method based on improved mutual information and branch entropy
YAO Rongpeng, XU Guoyan, SONG Jian
Journal of Computer Applications
2016, 36(10): 2772-2776.
DOI: 10.11772/j.issn.1001-9081.2016.10.2772
To address the problems of data sparsity, poor portability and the failure to recognize multi-character words (words of more than three characters) in micro-blog new word discovery algorithms, a new word discovery algorithm based on improved Mutual Information (MI) and Branch Entropy (BE), named MBN-Gram, was proposed. Firstly, N-Gram was used to extract candidate new words, and frequency and stop-word rules were used to filter the candidates. Then the improved MI and BE were used to expand and filter the candidates again. Finally, the remaining candidates were screened against the corresponding dictionary to obtain the new words. Theoretical and experimental analyses show that MBN-Gram improves the accuracy, recall and F-measure, and the experimental results show that the algorithm is effective and feasible.
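The first step (N-Gram candidate extraction followed by frequency and stop-word filtering) can be sketched as below. This is a minimal illustration, not the authors' code: the function name `extract_candidates`, the parameters `n_max`, `min_freq` and `stop_chars`, and the rule "discard a gram that starts or ends with a stop character" are all hypothetical choices made for the example.

```python
from collections import Counter


def extract_candidates(texts, n_max=4, min_freq=2, stop_chars=frozenset()):
    """Hypothetical sketch: count character n-grams of length 2..n_max and
    keep those that are frequent enough and do not start or end with a
    stop character (assumed filtering rule)."""
    counts = Counter()
    for text in texts:
        for n in range(2, n_max + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return {
        gram: freq
        for gram, freq in counts.items()
        if freq >= min_freq
        and gram[0] not in stop_chars
        and gram[-1] not in stop_chars
    }
```

For example, `extract_candidates(["xabcy", "abcz", "qabc"])` keeps `ab`, `bc` and `abc` (each seen three times); adding `"c"` to `stop_chars` leaves only `ab`.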
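The abstract does not define the *improved* MI, so the sketch below shows only the standard pointwise mutual information used as an internal-cohesion score: for a candidate $w$ split into a left part $a$ and right part $b$, $\mathrm{PMI}(a,b)=\log_2\frac{p(w)}{p(a)\,p(b)}$, taking the minimum over all binary splits. Estimating probabilities as count divided by total character count is an assumption, and the names `ngram_counts` and `min_pmi` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(texts, n_max=4):
    """Count all character n-grams of length 1..n_max (unigrams included,
    since PMI needs the probabilities of the split parts)."""
    counts = Counter()
    for text in texts:
        for n in range(1, n_max + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def min_pmi(word, counts, total):
    """Standard (not the paper's improved) cohesion score: the minimum PMI
    over all binary splits of `word`. Probabilities are approximated as
    count / total; an unseen word scores -inf."""
    if counts[word] == 0:
        return float("-inf")
    p_word = counts[word] / total
    scores = []
    for k in range(1, len(word)):
        left, right = word[:k], word[k:]
        # Every substring of a seen word is also seen, so these are nonzero.
        p_left = counts[left] / total
        p_right = counts[right] / total
        scores.append(math.log2(p_word / (p_left * p_right)))
    return min(scores)
```

With `texts = ["abab"]` and `total = 4`, `min_pmi("ab", ngram_counts(texts), 4)` evaluates to `log2(0.5 / (0.5 * 0.5)) = 1.0`. Taking the minimum over splits means a long candidate is only as cohesive as its weakest junction, which matters for words of more than three characters.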
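Branch entropy measures how freely a candidate combines with its neighbours. A common formulation, assumed here because the abstract does not give the paper's variant, computes the Shannon entropy of the characters immediately to the left and to the right of every occurrence and keeps the smaller of the two. The function name `branch_entropy` is hypothetical, and occurrences at a text boundary simply contribute no neighbour in this sketch.

```python
import math
from collections import Counter


def _entropy(counter):
    """Shannon entropy (bits) of a neighbour-frequency table; 0 if empty."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def branch_entropy(word, texts):
    """Assumed formulation: min(left entropy, right entropy) of the
    characters adjacent to each occurrence of `word` in `texts`."""
    left, right = Counter(), Counter()
    for text in texts:
        start = text.find(word)
        while start != -1:
            if start > 0:  # boundary occurrences have no left neighbour
                left[text[start - 1]] += 1
            end = start + len(word)
            if end < len(text):
                right[text[end]] += 1
            start = text.find(word, start + 1)
    return min(_entropy(left), _entropy(right))
```

In `["xaby", "zabw"]` the candidate `ab` has two distinct neighbours on each side, giving 1.0 bit; in `["xaby", "xabw"]` its left neighbour is always `x`, so the score drops to 0.0, which suggests `ab` is likely part of a longer word such as `xab`.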
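The final screening step can be illustrated as a threshold-plus-dictionary filter. This is a hypothetical sketch: it assumes each candidate already carries an (MI, BE) score pair, and the function name `screen_new_words` and the thresholds `mi_min` and `be_min` are illustrative, not values from the paper.

```python
def screen_new_words(scores, dictionary, mi_min, be_min):
    """Hypothetical final filter. `scores` maps candidate -> (mi, be).
    Keep candidates that pass both thresholds and are not already in the
    existing dictionary, i.e. are genuinely *new* words."""
    return sorted(
        word
        for word, (mi, be) in scores.items()
        if mi >= mi_min and be >= be_min and word not in dictionary
    )
```

For instance, with scores `{"ab": (2.0, 1.5), "cd": (0.1, 2.0), "ef": (3.0, 1.0)}`, dictionary `{"ef"}` and both thresholds at 1.0, only `ab` survives: `cd` fails the MI threshold and `ef` is already a known word.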